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Abstract Genome sequence analysis of Middle East respiratory syndrome coronavirus 
(MERS-CoV) variants from patient specimens has revealed the evolutionary dynamics and 
mechanisms of pathogenesis of the virus. However, most studies have analyzed the 
consensus sequences of MERS-CoVs, precluding an investigation of intrapatient 
heterogeneity. Here, we analyzed non-consensus sequences to characterize intrapatient 
heterogeneity in cases associated with the 2015 outbreak of MERS in South Korea. Deep¬ 
sequencing analysis of MERS-CoV genomes performed on specimens from eight patients 
revealed significant intrapatient variation; therefore, sequence heterogeneity was further 
analyzed using targeted deep sequencing. A total of 35 specimens from 24 patients 
(including a super-spreader) were sequenced to detect and analyze variants displaying 
intrapatient heterogeneity. Based on the analysis of non-consensus sequences, we 
demonstrated the intrapatient heterogeneity of MERS-CoVs, with the highest level in the 
super-spreader specimen. The heterogeneity could be transmitted in a close association 
with variation in the consensus sequences, suggesting the occurrence of multiple MERS- 
CoV infections. Analysis of intrapatient heterogeneity revealed a relationship between 
D510G and I529T mutations in the receptor-binding domain (RBD) of the viral spike 
glycoprotein. These two mutations have been reported to reduce the affinity of the RBD 
for human CD26. Notably, although the frequency of both D510G and I529T varied 
greatly among specimens, the combined frequency of the single mutants was consistently 
high (87.7% ± 1.9% on average). Concurrently, the frequency of occurrence of the wild 
type at the two positions was only 6.5% ± 1.7% on average, supporting the hypothesis 
that selection pressure exerted by the host immune response played a critical role in 
shaping genetic variants and their interaction in human MERS-CoVs during the outbreak. 
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INTRODUCTION 


Middle East respiratory syndrome coronavirus (MERS-CoV) was first isolated from a patient in 
Saudi Arabia in 2012 and has been shown to cause severe acute respiratory illness, including 
fever, cough, and shortness of breath (Zaki et al. 2012). As of March 1,2016,1638 laboratory- 
confirmed cases (587 deaths; 36% case fatality rate [CFR]) have been reported to the World 
Health Organization. A South Korean outbreak of MERS began in May 2015, and its transmis¬ 
sion continued until early July, resulting in 186 laboratory-confirmed cases with 38 deaths 
(20.4% CFR). In contrast to previous studies, which have suggested limited person-to-person 
transmissibility of MERS-CoV (Breban et al. 2013; Cotten et al. 2013b), many secondary and 
tertiary cases of transmission occurred during the South Korean outbreak. Importantly, more 
than half of the tertiary cases were transmitted from one particular super-spreader, called 
Patient 14 in this study. This unusual transmission pattern raised questions related to trans¬ 
missibility as well as the potential adaptations of MERS-CoV to the human host. To address 
these questions, several researchers have investigated MERS-CoV sequences. However, all 
previous studies on the South Korean outbreak have focused only on the consensus se¬ 
quences of MERS-CoVs (Wang et al. 2015; Kim et al. 2016a,b; Park et al. 2016; Seong 
et al. 2016). 

The evolutionary dynamics of RNA viruses are complex because of their high mutation 
rates, rapid replication rates, and large population sizes. Multiple rounds of replication of 
a given viral genome can generate a cloud of diverse variants or a heterogeneous viral pop¬ 
ulation. Previous reports on other RNA viruses have suggested that quasispecies diversity, 
ratherthan the selection of individual variants, correlates with pathogenicity and enables ad¬ 
aptation to new environments (Vignuzzi et al. 2006; Lauring and Andino 2010). The impor¬ 
tance of characterizing viral populations as swarms with similar variants has led several 
groups to use next-generation sequencing (NGS) technologies on clinical samples to ex¬ 
plore the complexity of the spectrum of mutant viruses, including HIV-1 and hepatitis B virus 
(Bushman et al. 2008; Eriksson et al. 2008; Margeridon-Thermet et al. 2009). Although the 
description of MERS-CoV quasispecies is far from completion, previous studies that recov¬ 
ered whole MERS-CoV genome sequences from dromedary camels revealed the existence 
of intrahost single-nucleotide variants (Briese et al. 2014; Borucki et al. 2016). 

To date, no intrahost MERS-CoV single-nucleotide variants have been identified in hu¬ 
mans, except by Cotton et al. (2013a), where data from one specimen showed the presence 
of non-consensus variants (<5%). Currently, it is hard to discern whether only consensus se¬ 
quences have been reported or whether human MERS-CoV sequences represent almost 
clonal virus populations within individual cases. By analyzing non-consensus sequences in 
35 specimens from 24 patients infected during the South Korean outbreak (2015), we dem¬ 
onstrated the existence of intrapatient heterogeneity and investigated its functional 
implications. 


RESULTS 


Patients and Sample Collection 

Among the 24 cases, 14 were patients, four were caregivers, and six were health-care work¬ 
ers (HCWs) (Table 1). Patient 14 was a second-generation case who had been exposed to the 
index patient. Exposure to Patient 14 led to a second wave of the outbreak, resulting in a 
total of 81 third-generation infections (Oh et al. 2015). Specimens from 20 of these third-gen¬ 
eration cases were analyzed in this study (Supplemental Fig. SI). Patients 162, 164, and 169 
were HCWs and fourth-generation case patients who had been exposed to Patient 135. 
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Table 1. 

Characteristics of patients included in 

this study 



Patient 

no. 

Age/ 

sex 

Case 

type 

Disease 

severity 

Underlying disease 

Transmission 

Source 

Generation patient 

14 

35/M 

Patient 

Severe 

None 

2nd 

1 

48 

37/M 

Caregiver 

Moderate 

None 

3rd 

14 

50 

80/F 

Patient 

Severe 

Posterior cerebral artery 
infarction 

3rd 

14 

61 

55/M 

Caregiver 

Severe 

None 

3rd 

14 

62 

30/M 

HCW 

Mild 

None 

3rd 

14 

66 

42/F 

Patient 

Moderate 

Myelodysplastic syndrome 

3rd 

14 

68 

54/F 

Caregiver 

Moderate 

None 

3rd 

14 

75 

62/M 

Patient 

Moderate 

Rectal cancer 

3rd 

14 

77 

63/M 

Patient 

Severe 

Necrotizing pancreatitis 

3rd 

14 

78 

41/F 

HCW 

Moderate 

None 

3rd 

14 

80 

34/M 

Patient 

Severe 

Malignant lymphoma 

3rd 

14 

99 

47/M 

Caregiver 

Moderate 

None 

3rd 

14 

100 

32/F 

Patient 

Moderate 

Neuroendocrine tumor 

3rd 

14 

101 

84/M 

Patient 

Severe 

Renal cell carcinoma 

3rd 

14 

102 

48/F 

Patient 

Moderate 

None 

3rd 

14 

103 

65/M 

Patient 

Moderate 

Middle cerebral artery 
infarction 

3rd 

14 

134 

67/F 

Patient 

Mild 

None 

3rd 

14 

135 

32/M 

HCW 

Severe 

None 

3rd 

14 

155 

42/F 

Patient 

Mild 

Hypertrophic 

cardiomyopathy 

3rd 

14 

157 

59/M 

Patient 

Severe 

Lung cancer 

3rd 

14 

162 

32/M 

HCW 

Severe 

None 

4th 

135 

164 

35/F 

HCW 

Moderate 

None 

4th 

135 

169 

33/M 

HCW 

Moderate 

None 

4th 

135 

177 

49/F 

Patient 

Severe 

Malignant lymphoma 

3rd 

14 


HCW, health-care worker. 


To determine whether viral genomic variability was related to disease severity, patient 
cases were categorized as mild (symptomatic cases without pneumonia; n = 3), moderate 
(cases with pneumonia; n = 11), and severe (cases with respiratory failure or death; n = 10) 
(Table 1). In eight cases, multiple lower respiratory tract specimens were obtained 
(Supplemental Table SI). Specimens from patients 50, 77, and 135 were obtained before 
and after respiratory failure. 


Characterization of MERS-CoV Genome 

Specimens from eight patients who had been identified as MERS-CoV-positive were se¬ 
quenced. The samples were found positive by both upstream-of-the-envelope gene (upE) 
and open reading frame la ( orfla ) real-time polymerase chain reaction (PCR) assays, with 
cycle threshold (Ct) values averaging 17.3 (15.4-22.6) and 18.3 (15.4-23.0), respec¬ 
tively (Supplemental Table SI). For each patient, a MERS-CoV consensus sequence that 
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indicated the major nucleotide at each genomic position was generated. The genome se¬ 
quences of the eight isolates differed from each other at only seven positions (Table 2). 
The nucleotide substitutions were all nonsynonymous and occurred in the orflab (n = 3) 
and S (n = 4) genes (Table 2). The sequences of the eight isolates had high nucleotide iden¬ 
tities (ranging from 99.96% to 100%, with 99.98% to 100% sequence identities in ORF 1 a and 
1 b [orflab] and 99.85% to 100% identities in the spike glycoprotein gene) with recently pub¬ 
lished sequences of MERS-CoV isolates from the outbreak in South Korea (Kim et al. 2015; Lu 
et al. 2015; Seong et al. 2016). The E, M, and N genes were 100% identical to the previously 
described MERS-CoV isolates from South Korea. The sequence from Patient 14 in this study 
differed from the sequence from a previous study at two positions (Seong et al. 2016). 
Because the frequencies of the major nucleotides at these positions were close to 50% in 
this study, the differences between the consensus sequences may have arisen from small 
technical variations in measurement. 

Using 105 previously published genome sequences of MERS-CoVs, including those from 
recent cases in South Korea, we analyzed the phylogenetic relationships of eight newly se¬ 
quenced isolates. The new sequences clustered together with the other MERS-CoV isolates 
from the 2015 South Korean outbreak (Supplemental Fig. S2; Kim et al. 2015; Lu et al. 2015; 
Seong et al. 2016). 


Presence of Intrapatient Heterogeneity in MERS-CoVs 

Analyzing the MERS-CoV genome sequences, we noticed that a significant number of nu¬ 
cleotide sites had mixed bases. Because deep sequencing not only generates a consen¬ 
sus sequence but also identifies non-consensus nucleotides at each position, Cotton et al. 
(2013a) analyzed nucleotide variants present at >1% frequency to determine the variants 
important for intrahost evolution. In the previous study, the nucleotide variants present at 
a frequency greater than the sequencing error rate (i.e., 1%) were considered to be true 
variants, although this estimate may be conservative for most applications. Flowever, er¬ 
rors introduced during reverse transcription and PCR amplification can significantly distort 
estimates of allele frequencies, especially if a limited amount of RNA is used as an input. 
Thus, in this study, we evaluated the reproducibility of allele frequencies of non-consen¬ 
sus nucleotides. For this purpose, duplicate libraries generated by independent comple¬ 
mentary DNA (cDNA) synthesis using the same RNA samples were sequenced and 
compared. Allele frequencies of duplicate samples from Patients 80 and 162 were signifi¬ 
cantly correlated, implying that the majority of mixed bases were a reflection of intrapa¬ 
tient heterogeneity rather than a result of sequencing errors (Fig. 1A,B). 

Despite the relatively high reproducibility, we focused on nucleotide variants present at a 
relatively high frequency (i.e., >30%) to minimize false positives among non-consensus var¬ 
iants. Positions where a nucleotide variant was present at a frequency of >30% in any of the 
eight isolates were selected for further investigation using additional samples. Targeted 
deep sequencing was performed on an additional 27 samples targeting 3003-bp regions 
that included the variable sites. Eleven libraries from independent cDNA syntheses of five 
samples, including four libraries in duplicate and one in triplicate, were constructed to eval¬ 
uate the technical noise in measuring allele frequency. Only variants with frequencies signifi¬ 
cantly greater than those from technical noise were considered true variants. Technical noise 
is described in Methods and was based on a 5% significance level after performing 
Bonferroni adjustment. A total of 16 positions displaying intrapatient heterogeneity were 
identified through statistical testing. The results from seven specimens were validated using 
Sanger sequencing. We tested nine sites and confirmed concordant mixed bases at seven 
positions—namely, positions 6884, 7317, 11257, 21726, 22356, 22984, and 23041 (Fig. 
1C), but found discrepancies at positions 7322 and 19075. Although we were not able to 
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N, nonsynonymous substitution; S, synonymous substitution. 
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Figure 1. Reproducibility assessment for the detection of non-consensus variants. Correlation of allele fre¬ 
quencies between sample replicates from (A) Patient 80 and (B) Patient 162. (C) Representative validation 
by the Sanger sequencing method. 


validate all variants by Sanger sequencing owing to a limited sample volume, the results from 
the test set showed a 78% validation rate. In addition, the frequencies of the two variants at 
positions 22984 and 23041 (i.e., G at position 22984 and T at 23041; A at 22984 and C at 
23041) were highly correlated in the specimens (Supplemental Fig. S3), suggesting a tight 
linkage between these variants instead of random sequencing errors. 

Fourteen positions exhibiting mixed bases are presented in Figure 2. These include all 
seven nucleotide variants identified in the full-genome sequences of the eight MERS-CoV 
isolates, indicating that intrapatient MERS-CoV heterogeneity was closely associated with 
isolate-specific variants in the consensus sequences. Taken together, our data demonstrate 
the intrapatient heterogeneity of MERS-CoVs. 


Interactions Between Genetic Variants 

As described above, there was a tight linkage between positions 22984 and 23041—the 
variant at one position and the wild type at the other position or vice versa (i.e., D510G 
and 1529; D510 and I529T). Based on the genetic linkage, a functional relationship was 
expected between the variants at these two positions. In fact, these two mutations were 
located in one of the two major binding patches in the receptor-binding domain (RBD), 
where escape mutations from neutralizing antibodies were previously reported (Tang 
etal. 2014). 

Based on the tight correlation of base frequencies at positions 22984 and 23041, our 
data suggested that the double mutant carrying D510G and I529T is rare. Therefore, we 
selected sequencing reads covering both positions in each specimen to measure the 
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Patient 

14 

48 

50 

61 

62 

66 

68 

75 

77 

78 

80 

99 

100 

101 

102 

103 

134 

135 

155 

157 

177 

162 

164 

169 

Plausible 

source 

1 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

14 

135 

135 

135 

Disease 

severity 

Sev 

Mod 

Sev 

Sev 

Mild 

Moc 

Mod 

Moc 

Sev 

Mod 

Sev 

Mod 

Mod 

Sev 

Mod 

Mod 

Mild 

Sev 

Mild 

Sev 

Sev 

Sev 

Mod 

Mod 

Sampling 

date 

6.01 

6.30 

6.11 

6.26 

6.17 

6.11 

6.04 

6.04 

6.15 

6.05 

6.17 

6.11 

6.05 

6.11 

6.17 

6.22 

6.06 

6.11 

6.09 

6.09 

6.07 

6.12 

6.07 

6.12 

6.11 

6.17 

6.12 

6.22 

6.28 

7.01 

7.03 

6.22 

7.01 

6.21 

6.26 

Genomic 

position 

5730 

G 

G 

G 

G 

G 

G 

G 

G 

G 

GC 

GC 

G 

G 

G 

G 

G 

G 

Gg 

G 

G 

G 

G 

G 

G 

Gg 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

5917 

c 

c 

c 

c 

c 

c 

c 

c 

c 

C 

c 

c 

c 

c 

c 

c 

c 

C 

c 

c 

c 

c 

c 

c 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

Tc 

6884 

A 

A, 

A 

Ag 

A» 

Ag 

A 

A, 

A, 

A, 

A 

A 

A 

A 

A 

A 

A 

A. 

Ag 

A 

Ag 

A, 

A, 

A 

A, 

A 

A 

A. 

AG 

GA 

AG 

A 

Ag 

Ag 

A 

7317 

G. 

G, 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G, 

G 

G 

GA 

AG 

G 

G 

G 

G 

G 

G 

G 

G, 

G« 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

8765 

T 

T 

T 

T 

T 

T 

TC 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

Tc 

N 

T 

T 

T 

Tc 

T 

T 

11257 

CT 

Tc 

CT 

Tc 

TC 

C 

c 

TC 

Tc 

TC 

TC 

C 

C 

C 

C 

C 

T 

Tg 

TC 

TC 

Tc 

TC 

Ct 

TC 

C 

T 

C 

TC 

T 

T 

T 

T 

TC 

TC 

T 

17504 

G 

G 

G 

G 

G 

G 

GT 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G, 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

20411 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

T 

21726 

TC 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c. 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

C 

22126 

G 

G 

G 

G 

G 

G 

G 

G 

G 

Gt 

Gt 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

22356 

Ct 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

C 

C 

c. 

TC 

TC 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

22981 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

AC 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

22984 

AG 

A 

GA 

A, 

A, 

A, 

G 

A, 

A 

Ao 

A 

Ag 

Ag 

AG 

GA 

Ag 

A 

A. 

Ag 

Ao 

Ag 

A. 

A 

A 

A 

A 

GA 

Ag 

A 

A 

A 

A 

Ag 

A 

A 

23041 

CT 

c 

TC 

C, 

C. 

C, 

T 

Ct 

Ct 

Ct 

CT 

CT 

Ct 

TC 

Tg 

T. 

c 

Ct 

Ct 

Ct 

Ct 

Ct 

c 

c. 

c. 

c 

TC 

c. 

c 

c 

c 

c 

C, 

c 

c 


Figure 2. Summary of variable sites among 35 samples from 24 patients. At each position, a mixed base is 
displayed and colored if its frequency was >10%. For mixed bases, the font size reflects the frequency of 
that base. Mod, moderate; Sev, severe. 


frequencies of the double mutants (i.e., D510G, I529T), the single mutants (i.e., D51OG, 1529 
and D510,1529T), and the wild type (i.e., D510,1529). On average, the frequency of the dou¬ 
ble mutant was only 1.4% ± 0.3%, whereas the single mutants and the wild type were present 
at 87.7% ± 1.9% and 6.5% ± 1.7%, respectively (Fig. 3). Recently, both D510G and I529T mu¬ 
tations in RBD were shown to reduce its affinity for human CD26 compared with the wild- 
type RBD (Kim et al. 2016c). Although the previous study did not test the binding affinity 
of the double mutant, in our data the two mutations are mutually exclusive, suggesting 
that the double mutant severely impairs viral fitness. Notably, although the frequency of 
each single mutant varied greatly among specimens, the combined frequency of both single 
mutants was consistently high in most samples. At the same time, the frequency of the wild 
type was no more than 10% in most samples (31 of 35 specimens). Thus, our data suggest 
that there was strong selection pressure favoring the variants of the spike glycoprotein 
with reduced affinity for the host receptor. 

Another notable nonsynonymous substitution was Al 07V (at position 11257), which oc¬ 
curred in the nonstructural protein 6 (nsp6) coding region within orflab. This Al 07V variation 
in nsp6 and the I529T substitution in the spike glycoprotein were frequently found in the 
South Korean isolates and appeared to be genetically linked (Fig. 2; Supplemental Table 
S2). Nsp6 is a membrane-spanning protein and is an integral component of the viral replica¬ 
tion complex involved in double-membrane vesicle formation (Lundin et al. 2014). Although 
the functional importance of these MERS-CoV variants in viral replication and their interac¬ 
tion with the host immune system remain to be elucidated, our data showed that the selec¬ 
tion of these variants may not be independent from each other, suggesting the unit of 
selection may be a combination of variants rather than individual variants. 


Park et al. 2016 Cold Spring Harb Mol Case Stud 2: a001214 


7 of 14 
































Downloaded from molecularcasestudies.cshlp.org on February 7, 2019 - Published by Cold Spring Plarbor Laboratory Press 



COLD SPRING HARBOR 

Molecular Case Studies 


Intrapatient heterogeneity of MERS-CoV 


o Wild type 
o Single mutants 
o Double mutant 



Sample 

Figure 3. The fraction of mutant and wild-type alleles at positions 510 and 529 of the spike glycoprotein. To 
measure the frequencies of the mutant and the wild-type alleles, sequencing reads covering positions 510 and 
529 were selected. For each specimen, the read counts supporting each type were divided by the total read 
counts. The frequencies of the double mutants (D510G, I529T), the single mutants (D510G, 1529 and D510, 
I529T), and the wild-type (D510, 1529) alleles are plotted for 35 specimens sorted by sample number. 


DISCUSSION 


In this study, we demonstrated the intrapatient heterogeneity of MERS-CoVs, ruling out the 
possibility of technical noise. Based on the statistical test described above, we first selected 
16 non-consensus variant candidates with frequencies significantly higher than those from 
technical noise. Among those variant candidates, we tested nine candidates by Sanger se¬ 
quencing and validated seven variants, resulting in a 78% validation rate. After removal of 
the two nonvalidated candidates, we ultimately listed a total of 14 non-consensus variants, 
expecting an 89% validation rate. There was a tight correlation between the frequencies of 
the two variants at positions 22984 and 23041. These results suggest that technical artifacts 
were highly unlikely to have generated such non-consensus variants. Our analysis was limit¬ 
ed to relatively high-frequency variants because the sequencing error rate presents certain 
limitations for the study of RNA as opposed to DNA viruses. Technical advances such as the 
use of unique molecular identifiers to remove PCR errors may produce more accurate de¬ 
scriptions of MERS-CoV populations in patients (Kinde et al. 2011). Despite the limitations, 
we demonstrated the presence of heterogeneous MERS-CoV populations in clinical samples 
from MERS-CoV-infected patients. 

Given the large number of cases that resulted from exposure to Patient 14, it was intrigu¬ 
ing that Patient 14 displayed the highest intrapatient heterogeneity. Among the 10 single¬ 
nucleotide variants that were observed in more than one patient, six variants were present 
at frequencies >0% in Patient 14 (Fig. 2). Thus, the majority of intra- and interpatient hetero¬ 
geneity observed in different patients seemed to originate from the variants present in 
Patient 14. Our results showed that intrapatient heterogeneity can be transmitted in some 
cases, indicating that humans can be simultaneously infected with more than one MERS- 
CoV. 

Because most of the mixed bases observed in Patient 14 were not completely fixed in the 
subsequent generation of cases, the single-nucleotide variants might not have had signifi¬ 
cantly higher fitness than wild type. For example, the sequence from Patient 14 had a mixed 
base C/T at position 11257. Among sequences from patients exposed to Patient 14, a subset 
displayed either C or T alone with little intrapatient heterogeneity, whereas others showed 
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similar intrapatient heterogeneity to Patient 14. In addition, we could not find any significant 
differences in genetic variants based on disease severity group (Supplemental Tables S3 
and S4). Taken together, genetic variant composition in each patient varied, regardless of 
the transmissibility or disease severity, suggesting that transmission of individual genetic 
sequences was stochastic rather than selective. Even if the genetic variants had a selective 
advantage, the advantage of individual genetic sequences might have been weak and/or 
varied depending on the patient. 

Although we could not provide a clear picture of the functional interactions among dis¬ 
tinct genetic variants in the population, the high ratio of nonsynonymous to synonymous 
substitutions offered a clue to their functional impact. The 14 variants consisted of 12 non¬ 
synonymous and two synonymous substitutions. Assuming that the unit of selection might 
not be an individual variant but the population as a whole, we calculated dN (the number 
of nonsynonymous changes per nonsynonymous site) and d s (the number of synonymous 
changes per synonymous site) by the Pamilo-Bianchi-Li method (Li 1993; Pamilo and 
Bianchi 1993). The d^/ds ratio was 1.03, hinting at the possibility of positive selection. 
Notably, six of the 12 nonsynonymous substitutions were found in the spike glycoprotein¬ 
coding region. As quasispecies theory proposes, the unit of selection might not be an indi¬ 
vidual virus but the population as a whole. Our data showed that the frequencies of the sin¬ 
gle mutants (D510G, 1529 and D510, I529T) significantly fluctuated among specimens, but 
the combined frequency of the single mutants was consistently high in most samples, sug¬ 
gesting a combination of variants as the unit of selection. 

Recently, in an analysis of 13 MERS-CoV genomes associated with the 2015 outbreak in 
South Korea, Kim et al. (2016c) reported that 11 of those genomes had an I529T mutation in 
RBD, and one had a D510G mutation. The study showed that D510G and I529T mutations 
resulted in reduced affinity of RBD for the human CD26 receptor compared with the wild- 
type RBD. Based on the spread of a mutant MERS-CoV with reduced affinity for the receptor, 
the authors suggested that MERS-CoV adaptation during human-to-human spread might be 
driven by host immunological pressure such as neutralizing antibodies (Kim et al. 2016c). 
Consistent with the previous analysis, our data showed that the I529T and D51OG mutations 
were observed in the consensus sequences from 29 and 4 of 35 samples, respectively. 
Furthermore, our analysis at an intrapatient level showed that the frequency of the wild 
type at both positions was only 6.5% ± 1.7% on average, supporting a strong selection pres¬ 
sure favoring the variants over the wild type. 

Thus, we questioned whether the selection pressure was exerted by a host immune re¬ 
sponse such as neutralizing antibodies, as previously suggested (Kim et al. 2016c). In such 
cases, a reduction in host immune pressure might increase the frequency of the wild type. 
Four specimens displaying relatively high frequency (i.e., >10%) of the wild type belonged 
to Patients 77 and 80. We noticed dramatic changes in routine blood test results such as 
white blood cell (WBC) counts and C-reactive protein (CRP) levels in these patients. We 
analyzed 19 serial specimens from eight patients. For this, we used normalized WBC count 
values expressed as a percentage of the first of a series of samples from each patient. Serial 
samples from Patients 77 and 80 displayed a dramatic decrease in the normalized WBC 
count and a simultaneous increase in the frequency of the wild-type allele (Supplemental 
Fig. S4). Whereas WBC counts significantly decreased in those patients, the CRP level in¬ 
creased during the period of MERS-CoV infection, indicating an impaired immune response. 
Although these data are limited owing to the small sample size and the lack of direct mea¬ 
surement of host immunological pressure, our results suggest that the selection pressure ex¬ 
erted by the host immune response might favor variants with reduced affinity to the host 
receptor; therefore, a reduction in this selection pressure resulted in the expansion of viruses 
with the wild-type allele. The characterization and quantification of neutralizing antibodies in 
patients overtime is required to determine their association with the mutants and to validate 
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the hypothesis. Moreover, more accurate descriptions of MERS-CoV populations within pa¬ 
tients will advance our understanding of the complex molecular evolution of MERS-CoVs. 

Because the evolution of a virus is a continuous process that takes place during intrapa¬ 
tient infection and interpatient transmission, the evolution of MERS-CoVs should be studied 
at both intra- and interpatient levelsto obtain a better understanding of the principles under¬ 
lying the complex molecular evolution of MERS-CoVs in natural populations. However, our 
knowledge of MERS-CoV evolution primarily depends on analysis at the interpatient level, 
ignoring genetic diversity within individual patients. In this study, we demonstrated intrapa¬ 
tient heterogeneity in human MERS-CoV isolates. Based on the analyses of the genetic diver¬ 
sity of MERS-CoVs at both the intra- and interpatient levels, our results shed light on the 
evolutionary dynamics of MERS-CoVs associated with the South Korean outbreak. 


METHODS 


Collection of Clinical Specimens 

A total of 35 clinical specimens from 24 patients were determined positive for MERS-CoV 
and were included in this study. Clinical information and information about possible expo¬ 
sure to other MERS-CoV patients was collected from the electronic medical records and pub¬ 
licly available data from multiple sources including the South Korea Centers for Disease 
Control and Prevention and the South Korea Ministry of Health and Welfare. These data in¬ 
cluded age, gender, epidemiologic link, dates of suspected exposure to MERS patients, ini¬ 
tial symptoms, date of symptom onset, and clinical courses. 

MERS-CoV Real-Time Reverse Transcription PCR Assays 

RNA was extracted from clinical specimens using either a QIAamp DSP Viral RNA Mini Kit 
(Catalog No. 61904, QIAGEN GmbH) or an automated MagNAPure 96 extraction instrument 
with a total nucleic acid isolation kit (Roche). The extraction was performed according to the 
manufacturers' instructions and the extracted RNA was stored at —70°C. 

MERS-CoVs were detected in specimens using real-time reverse transcription (rRT)-PCR. 
Extracted RNAs were screened by targeting the upE region, and positive results were con¬ 
firmed by a subsequent amplification of orfla using a PowerChek MERS Real-Time PCR kit 
(Kogene Biotech, Seoul, South Korea). All rRT-PCR reactions were performed using the 7500 
Fast Real-Time PCR System (Applied Biosystems) with a total reaction volume of 20 pL (15 pL 
of PCR reaction mixture and 5 pL of template RNA). The thermocycling conditions included a 
reverse transcription reaction for 30 min at 50°C, followed by 10 min at 95°C, and then 40 
cycles of 15 sec at 95°C and 60 sec at 60°C. A positive viral template control and a nontem¬ 
plate control were included in each run. The glyceraldehyde-3-phosphate dehydrogenase 
( GAPDH ) gene was amplified simultaneously as an internal control to monitor PCR inhibition. 
A positive result was identified by a well-defined exponential fluorescence curve that crossed 
the defined threshold at <35 cycles in both the upE and ORFIa assays. 

Reverse Transcription and PCR Amplification for Sequencing 

Five microliters of viral RNA from each sample was used as a template for cDNA synthesis 
using a Superscript III First-Strand Synthesis System (Life Technologies). Equal amounts of 
cDNA product were used to perform PCR with a Herculase II Fusion DNA polymerase 
(Agilent Technologies). For full-genome sequencing, primers described by Cotten et al. 
(2013a) were used with minor modifications for efficient amplification. A set of 60 primers 
was used to generate fifteen 2.5-kb overlapping amplicons (four primers per amplicon) 
and three additional primers were added for the extreme termini of the genome 
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(Supplemental Table S5). Primers used for the targeted sequencing are listed in 
Supplemental Table S6. Primers and nucleotides were removed from the final PCR products 
using a MiniElute PCR purification kit (QIAGEN). 


Sequencing 

The PCR products from each patient were combined into a pool of approximately equal mo¬ 
larity and subjected to paired-end library construction with a TruSeq Nano DNA Sample 
prep kit (lllumina). Libraries were sequenced using an lllumina MiSeq sequencer (IIlumina), 
generating 150-bp paired-end reads. On average, 4113* and 9361 x raw read depths 
were achieved for full-genome and targeted deep sequencing, respectively. Sequencing 
metrics including mean depth of coverage of each sample are summarized in Supplemental 
Table S7. Bases were called with MiSeq reporter software (ver. 2.4.60). The reads were 
quality filtered using the command "fastq_qua Iity_fiIter," which required a minimum of 
90% bases with a quality score of 20 or higher. Paired-end reads were aligned with the 
National Center for Biotechnology Information (NCBI) MERS-CoV reference sequence 
(NC_019843.3) using the Burrows-Wheeler Aligner (BWA) (Li and Durbin 2010). SAMtools 
vO.1.19 (Li et al. 2009), GATK v2.4-7 (McKenna et al. 2010), and Picard vl.93 were used 
for sorting SAM/BAM files, local realignment, and duplicate markings, respectively. We 
used the SAMtools "mpileup" command to determine the read depth. The consensus se¬ 
quence was determined from the most commonly expressed nucleotide at each position. 


Phylogenetic Analysis 

We downloaded a total of 105 human MERS-CoV genome sequences from the NCBI 
database (http://www.ncbi.nlm.nih.gov/genomesA/irusVariation/Database/nph-select.cgi? 
cmd=database&taxid=1335626). Combining the eight MERS-CoV genome sequences 
from our study with the downloaded sequences, a phylogenetic analysis was performed after 
aligning the sequences using MUSCLE (http://drive5.com/muscle). A total of 30,130 aligned 
nucleotides were input into MEGA 7.0 (http://www.megasoftware.net) (Tamura et al. 2013), 
and a phylogenetic tree was constructed using the neighbor-joining method. The GenBank 
accession number foil owed by a brief description of the sequence, including the virus variant 
name and the country of isolation, is provided for each sequence in Supplemental Figure S2. 
Evolutionary distances were computed using the Kimura two-parameter model. The scale 
bar at the bottom of the figure indicates nucleotide substitutions per site. 


Determination of Technical Noise and Significance Test for Variants 

We used X^to denote the allele frequency for the ith sample, the jth chromosomal position, 
the base (A, T, G, or C), and the rth replicated experiment. If the total depth was <100x at a 
given ijr, we treated it as missing, irrespective of the b value. We defined the background 
noise value (N^ as the average of the absolute differences between replicated data for 
all combinations: 

M _ ELi E;<fc I x ijbk - Xjjbil lid = 1, if k < I, 

' Jb ELi ELi <ki ’ 1 ki = 0, otherwise. 

Each variant allele frequency was tested by Z with a mean of 

Ejb ~ (5Z /=1 Nijb)/ n 
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and a variance of 

V/b=(ELi( N Sb- E /b) 2 )/ (n -1 ) - 

when replicates were available for only n samples. 


ADDITIONAL INFORMATION 


Data Deposition and Access 

The seven full-length and one partial virus genomes were deposited at GenBank (http:// 
www.ncbi.nlm.nih.gov/genbank/) under accession numbers KX034093-KX034100. Raw 
sequencing data were deposited in the Sequence Read Archive (SRA; http://www.ncbi. 
nlm.nih.gov/sra) under accession number SRP072910. 
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